Risk-Sensitive Mean Field Games

Hamidou Tembine, Quanyan Zhu, Tamer Başar



Abstract 

In this paper, we study a class of risk-sensitive mean-field stochastic differential games. We show
that, under appropriate regularity conditions, the mean-field value of the stochastic differential game with
exponentiated integral cost functional coincides with the value function described by a
Hamilton-Jacobi-Bellman (HJB) equation with an additional quadratic term. We provide an explicit solution
of the mean-field best response when the instantaneous cost functions are log-quadratic and the state
dynamics are affine in the control. An equivalent mean-field risk-neutral problem is formulated and the
corresponding mean-field equilibria are characterized in terms of backward-forward macroscopic
McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations, and HJB equations. We provide numerical
examples on the mean field behavior to illustrate both linear and McKean-Vlasov dynamics.

I. Introduction 

Most formulations of mean-field (MF) models such as anonymous sequential population games
[?], [?], MF stochastic controls [?], [?], [?], MF optimization, MF teams [33], MF stochastic
games [34], [1], [33], [31], MF stochastic difference games [14], and MF stochastic differential
games [23], [13], [32] have been of risk-neutral type, where the cost (or payoff, utility) functions
to be minimized (or to be maximized) are the expected values of stage-additive loss functions.
Not all behavior, however, can be captured by risk-neutral cost functions. One way of capturing
risk-seeking or risk-averse behavior is by exponentiating loss functions before expectation
(see [?], [?] and the references therein).

The particular risk-sensitive mean-field stochastic differential game that we consider in this
paper involves an exponential term in the stochastic long-term cost function. This approach was
first taken by Jacobson in [18], when considering the risk-sensitive Linear-Quadratic-Gaussian
(LQG) problem with state feedback. Jacobson demonstrated a link between the exponential cost
criterion and deterministic linear-quadratic differential games. He showed that the risk-sensitive
approach provides a method for varying the robustness of the controller and noted that in the
case of no risk, or risk-neutral case, the well-known LQR solution would result (for follow-up
work on risk-sensitive stochastic control problems with noisy state measurements, see [35], [?], [?]).

In this paper, we examine the risk-sensitive stochastic differential game in a regime of large 
population of players. We first present a mean-field stochastic differential game model where the 
players are coupled not only via their risk-sensitive cost functionals but also via their states. The 
main coupling term is the mean-field process, also called the occupancy process or population 
profile process. Each player reacts to the mean field or a subset of the mean field generated by 
the states of the other players in an area, and at the same time the mean field evolves according 
to a controlled Kolmogorov forward equation. 

Our contribution can be summarized as follows. Using a particular structure of state dynamics,
we derive the mean-field limit of the individual state dynamics leading to a non-linear controlled
macroscopic McKean-Vlasov equation; see [21]. Combining this with a limiting risk-sensitive
cost functional, we arrive at the mean-field response framework, and establish its compatibility
with the density distribution using the controlled Fokker-Planck-Kolmogorov forward equation.
The mean-field equilibria are characterized by coupled backward-forward equations. In general, a
backward-forward system may not have a solution (a simple example is provided in Section III-D).
An explicit solution of the Hamilton-Jacobi-Bellman (HJB) equation is provided for the
affine-exponentiated-Gaussian mean-field problem. An equivalent risk-neutral mean-field problem (in
terms of value function) is formulated and the solution of the mean-field game problem is
characterized. Finally, we provide a sufficiency condition for having at most one smooth solution
to the risk-sensitive mean field system in the local sense.

The rest of the paper is organized as follows. In Section II, we present the model description.
We provide an overview of the mean-field convergence result in Section II-A. In Section III, we
present the risk-sensitive mean-field stochastic differential game formulation and its equivalences.
In Section IV, we analyze a special class of risk-sensitive mean-field games where the state
dynamics are linear and independent of the mean field. In Section V, we provide a numerical
example, and Section VI concludes the paper. An appendix includes proofs of two main results
in the main body of the paper. We summarize some of the notations used in the paper in Table I.



II. The problem setting 

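We consider a class of $n$-person stochastic differential games, where Player $j$'s individual
state, $x_j^n$, evolves according to the Itô stochastic differential equation (S) as follows:

$$
dx_j^n(t) = \frac{1}{n}\sum_{i=1}^{n} f_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big)\,dt
+ \frac{\sqrt{\epsilon}}{n}\sum_{i=1}^{n} \sigma_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big)\,d\mathbb{B}_j(t),
$$

$$
x_j^n(0) = x_{j,0} \in \mathcal{X} \subseteq \mathbb{R}^k, \quad k \ge 1, \quad j \in \{1, \ldots, n\}, \tag{S}
$$

where $x_j^n(t)$ is the $k$-dimensional state of Player $j$; $u_j^n(t) \in \mathcal{U}_j$ is the control of Player $j$ at
time $t$, with $\mathcal{U}_j$ being a subset of a finite-dimensional Euclidean space; $\mathbb{B}_j(t)$ are mutually
independent standard Brownian motion processes; and $\epsilon$ is a small positive parameter,
which will play a role in the analysis in the later sections. We will assume in (S) that there is
some symmetry in $f_{ji}$ and $\sigma_{ji}$, in the sense that there exist $f$ and $\sigma$ (conditions on which will
be specified shortly) such that for all $j$ and $i$,

$$ f_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big) \equiv f\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big) $$

and

$$ \sigma_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big) \equiv \sigma\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big). $$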

The system (S) is a controlled McKean-Vlasov dynamics. Historically, the McKean-Vlasov 
stochastic differential equation (SDE) is a kind of mean field forward SDE suggested by Kac 
in 1956 as a stochastic toy model for the Vlasov kinetic equation of plasma and the study of 
which was initiated by McKean in 1966. Since then, many authors have made contributions to
McKean-Vlasov type SDEs and related applications [20], [10].

TABLE I

Summary of Notations

Symbol                  Meaning

$f$                     drift function (finite dimensional)
$\sigma$                diffusion function (finite dimensional)
$x_j^n(t)$              state of Player $j$ in a population of size $n$
$\bar{x}_j(t)$          solution of macroscopic McKean-Vlasov equation
$x_j(t)$                limit of state process
$\mathcal{U}_j$         space of feasible control actions of Player $j$
$\tilde{\gamma}_j$      state-feedback strategy of Player $j$
$\gamma_j$              individual state-feedback strategy of Player $j$
$\tilde{\Gamma}_j$      set of admissible state-feedback strategies of Player $j$
$\Gamma_j$              set of admissible individual state-feedback strategies of Player $j$
$u_j$                   control action of Player $j$ under a generic control strategy
$c$                     instantaneous cost function
$g$                     terminal cost function
$\delta$                risk-sensitivity index
$\mathbb{B}_j$          standard Brownian motion process for Player $j$'s dynamics
$\mathbb{E}$            expectation operator
$L$                     risk-sensitive cost functional
$\partial_x$            partial derivative with respect to $x$ (gradient)
$\partial^2_{xx}$       second partial derivative (Hessian operator) with respect to $x$
$x'$                    transpose of $x$
$m_t^n$                 empirical measure of the states of the players
$m_t$                   limit of $m_t^n$ when $n \to \infty$
$m^n$                   limit of $m_t^n$ when $t \to \infty$
$\mathrm{tr}(M)$        trace of a square matrix $M$, i.e., $\mathrm{tr}(M) := \sum_i M_{ii}$
$A \succ B$             $A - B$ is positive definite, where $A$, $B$ are square symmetric matrices of the same dimension




The uncontrolled version of state dynamics (S) captures many interesting problems involving 
interactions between agents. We list below a few examples. 

Example 1 (Stochastic Kuramoto model). Consider $n$ oscillators where each of the oscillators is
considered to have its own intrinsic natural frequency $\omega_j$, and each is coupled symmetrically to
all other oscillators. For $f_{ji}(x_j, u_j, x_i) = f(x_j, u_j, x_i) = K \sin(x_i - x_j) + \omega_j$ and $\sigma_{ji}$ a constant
in (S), the state dynamics without control is known as the (stochastic) Kuramoto oscillator [22],
where the goal is convergence to some common value (consensus) or alignment of the players'
parameters. The stochastic Kuramoto model is given by

$$ d\theta_j(t) = \Big(\omega_j + \frac{K}{n}\sum_{i=1}^{n} \sin\big(\theta_i(t) - \theta_j(t)\big)\Big)\,dt + D\,d\mathbb{B}_j(t), $$

where $D, K > 0$.
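
To make the structure of (S) concrete, the following is a minimal Euler-Maruyama sketch of the stochastic Kuramoto model above. The function names, the step size `dt`, and the parameter values in the usage part are assumptions made for this illustration and are not taken from the paper.

```python
import math
import random


def order_parameter(theta):
    """Modulus r of (1/n) * sum_j exp(i * theta_j); r = 1 means full alignment."""
    n = len(theta)
    c = sum(math.cos(th) for th in theta) / n
    s = sum(math.sin(th) for th in theta) / n
    return math.hypot(c, s)


def simulate_kuramoto(theta0, omega, K, D, T, dt=0.01, seed=0):
    """Euler-Maruyama scheme for
        d theta_j = (omega_j + (K/n) sum_i sin(theta_i - theta_j)) dt + D dB_j.
    Returns the list of phases at time T."""
    rng = random.Random(seed)
    theta = list(theta0)
    n = len(theta)
    steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        # (1/n) sum_i sin(theta_i - theta_j) = s*cos(theta_j) - c*sin(theta_j),
        # so each player reacts only to the two population averages c and s.
        c = sum(math.cos(th) for th in theta) / n
        s = sum(math.sin(th) for th in theta) / n
        theta = [
            th
            + (om + K * (s * math.cos(th) - c * math.sin(th))) * dt
            + D * sqrt_dt * rng.gauss(0.0, 1.0)
            for th, om in zip(theta, omega)
        ]
    return theta


if __name__ == "__main__":
    # Hypothetical population: 50 oscillators with random phases and frequencies.
    init = random.Random(1)
    theta0 = [init.uniform(-math.pi, math.pi) for _ in range(50)]
    omega = [init.gauss(0.0, 0.1) for _ in range(50)]
    final = simulate_kuramoto(theta0, omega, K=2.0, D=0.1, T=5.0)
    print("order parameter at T:", order_parameter(final))
```

The drift uses only the averages `c` and `s`, which is the sense in which each oscillator interacts with the population through its empirical distribution rather than with each individual neighbor.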

Example 2 (Stochastic Cucker-Smale dynamics). Consider a population, say of birds or fish,
that move in the three-dimensional space. It has been observed that for some initial conditions,
for example on their positions and velocities, the state of the flock converges to one in which
all birds fly with the same velocity. See, for example, Cucker-Smale flocking dynamics [?],
where each vector $x_i = (y_i, v_i)$ is composed of position dynamics and velocity dynamics of
the corresponding player. For $f(x_i, u_i, x_j) = (\epsilon^2 + \|x_j - x_i\|^2)^{-\alpha}\, c(x_j - x_i)$ in (S), where
$\epsilon > 0$, $\alpha > 0$ and $c(\cdot)$ is a continuous function, one arrives at a generic class of consensus
algorithms developed for flocking problems.

Example 3 (Temperature dynamics for energy-efficient buildings). Consider a heating system
serving a finite number of zones. In each zone, the goal is to maintain a certain temperature.
Denote by $T_j$ the temperature of zone $j$, and by $T^{ext}$ the ambient temperature. The law of
conservation of energy can be written down as the following equation for zone $j$,

$$ dT_j(t) = \Big[ r_j(t) + \gamma\big(T^{ext}(t) - T_j(t)\big) + \frac{\beta}{n}\sum_{i=1}^{n} \alpha_{ij}(t)\big(T_i(t) - T_j(t)\big) \Big]\,dt + \sigma\,d\mathbb{B}_j(t), $$

where $r_j$ denotes the heat input rate of the heater in zone $j$, $\gamma, \beta > 0$, $\alpha_{ij}$ is the thermal
conductance between zone $i$ and zone $j$, and $\sigma$ is a small variance term. The evolution of the
temperature has a McKean-Vlasov structure of the type in system (S). We can introduce a control
variable into $r_j$ such that the heater can be turned on and off in each zone.

The three examples above can be viewed as special cases of the system (S). The controlled 
dynamics in (S) allows one to address several interesting questions. For example, how to control 
the flocking dynamics and consensus algorithms of the first two examples above to a certain 
target? How to control the temperature in the third example in order to achieve a specific thermal 
comfort while minimizing energy cost? In order to define the controlled dynamical system in 
precise terms, we have to specify the nature of information that players are allowed in the choice 
of their control at each point in time. This brings us to the first definition below.

Definition 1. A state-feedback strategy for Player $j$ is a mapping $\tilde{\gamma}_j : \mathbb{R}_+ \times (\mathbb{R}^k)^n \to \mathcal{U}_j$,
whereas an individual state-feedback strategy for Player $j$ is a mapping $\gamma_j : \mathbb{R}_+ \times \mathbb{R}^k \to \mathcal{U}_j$.

Note that the individual state-feedback strategy involves only the self state of a player, whereas
the state-feedback strategy involves the entire $nk$-dimensional state vector. The individual
strategy spaces in each case have to be chosen in such a way that the resulting system of
stochastic differential equations (S) admits a unique solution (in the sense specified shortly)
when the players pick their strategies independently; furthermore, the feasible sets are time
invariant and independent of the controls. We denote by $\Gamma_j$ the set of such admissible control
laws $\gamma_j : [0,T] \times \mathbb{R}^k \to \mathcal{U}_j$ for Player $j$; a similar set, $\tilde{\Gamma}_j$, can be defined for state-feedback
strategies $\tilde{\gamma}_j$.

We assume the following standard conditions on $f$, $\sigma$, $\gamma_j$ and the action sets $\mathcal{U}_j$, for all $j =
1, 2, \ldots, n$.

(i) $f$ is $C^1$ in $(t,x,u,m)$, and Lipschitz in $(x,u,m)$;

(ii) The entries of the matrix $\sigma$ are $C^2$ and $\sigma\sigma'$ is strictly positive;

(iii) $f$, $\partial_x f$ are uniformly bounded;

(iv) $\mathcal{U}_j$ is non-empty, closed and bounded;

(v) $\gamma_j : [0,T] \times \mathbb{R}^k \to \mathcal{U}_j$ is piecewise continuous in $t$ and Lipschitz in $x$.

Normally, when we have a cost function for Player $j$ that depends also on the state variables
of the other players, either directly or implicitly through the coupling of the state dynamics (as
in (S)), any state-feedback Nash equilibrium solution will generally depend not only on
self states but also on the other states, i.e., it will not be in the set $\Gamma_j$, $j = 1, \ldots, n$. However,
this paper aims to characterize the solution in the high-population regime (i.e., as $n \to \infty$), in
which case the dependence on other players' states will be through the distribution of the player
states. Hence each player will respond (in an optimal, cost-minimizing manner) to the behavior
of the mass population and not to behaviors of individual players. Validity of this property will
be established later in Section III of the paper, but in anticipation of this, we first introduce the
quantity

$$ m_t^n = \frac{1}{n}\sum_{j=1}^{n} \delta_{x_j^n(t)} $$

as an empirical measure of the collection of states of the players, where $\delta$ is a Dirac measure
on the state space. This enables us to introduce the long-term cost function of Player $j$ (to be
minimized by him) in terms of only the self variables ($x_j$ and $u_j$) and $m_t^n$, $t \ge 0$, where the
latter can be viewed as an exogenous process (not directly influenced by Player $j$). But we first
introduce a mean-field representation of the dynamics (S), which uses $m_t^n$ and will be used in
the description of the cost.

October 11, 2012 DRAFT

A. Mean-field representation 

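The system (S) can be written into a measure representation using the formula

$$ \int \varphi(w)\Big[\sum_{i=1}^{n} \omega_i\,\delta_{z_i}\Big](dw) = \sum_{i=1}^{n} \omega_i\,\varphi(z_i), $$

where $\delta_z$, $z \in \mathcal{X}$, is a Dirac measure concentrated at $z$, $\varphi$ is a measurable bounded function
defined on the state space, and $\omega_i \in \mathbb{R}$. Then, the system (S) reduces to the system

$$
dx_j^n(t) = \Big(\int_w f\big(t, x_j^n(t), u_j^n(t), w\big)\,m_t^n(dw)\Big)\,dt
+ \sqrt{\epsilon}\,\Big(\int_w \sigma\big(t, x_j^n(t), u_j^n(t), w\big)\,m_t^n(dw)\Big)\,d\mathbb{B}_j(t),
$$

$$ x_j^n(0) = x_{j,0} \in \mathbb{R}^k, \quad k \ge 1, \quad j \in \{1, \ldots, n\}. \tag{SM} $$

The above representation of the system (SM) can be seen as a controlled interacting particles
representation of a macroscopic McKean-Vlasov equation where $m_t^n$ represents the discrete
density of the population. Next, we address the mean field convergence of the population profile
process $m^n$. To do so, we introduce the key notion of indistinguishability.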
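
The identity behind the measure representation is that averaging a function over the players is the same as integrating it against the empirical measure. A minimal sketch of this, with scalar states and a hypothetical drift chosen only for illustration (the helper names are not from the paper):

```python
def empirical_integral(phi, states):
    """Integral of phi against m^n = (1/n) * sum_j delta_{x_j}."""
    return sum(phi(x) for x in states) / len(states)


def mean_field_drift(f, t, x_j, u_j, states):
    """(1/n) * sum_i f(t, x_j, u_j, x_i), written as the integral of
    w -> f(t, x_j, u_j, w) against the empirical measure of `states`."""
    return empirical_integral(lambda w: f(t, x_j, u_j, w), states)


if __name__ == "__main__":
    # Hypothetical linear attraction toward the other players plus the control.
    def f(t, x, u, w):
        return (w - x) + u

    states = [1.0, 2.0, 3.0]
    print(mean_field_drift(f, 0.0, states[0], 0.5, states))
```

Because the drift of Player $j$ depends on the other players only through `states` as a collection, relabeling the players leaves it unchanged, which is what the indistinguishability property formalizes.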

Definition 2 (Indistinguishability). We say that a family of processes $(x_1^n, x_2^n, \ldots, x_n^n)$ is
indistinguishable (or exchangeable) if the law of $x^n$ is invariant by permutation over the index set
$\{1, \ldots, n\}$.

The solution of (S) obtained under fixed control $u(\cdot)$ generates indistinguishable processes. For
any permutation $\pi$ over $\{1, 2, \ldots, n\}$, one has $\mathcal{L}(x_1^n, \ldots, x_n^n) = \mathcal{L}(x_{\pi(1)}^n, \ldots, x_{\pi(n)}^n)$, where
$\mathcal{L}(X)$ denotes the law of the random variable $X$. For indistinguishable (exchangeable) processes,
the convergence of the empirical measure has been widely studied (see [29] and the references
therein). To preserve this property for the controlled system we restrict ourselves to admissible
homogeneous controls. Then, the mean field convergence is equivalent to the existence of a
random measure $\mu$ such that the system is $\mu$-chaotic, i.e.,

$$ \lim_{n \to \infty} \mathbb{E}\Big[\prod_{l=1}^{L} \varphi_l(x_l^n)\Big] = \prod_{l=1}^{L} \Big(\int \varphi_l\,d\mu\Big) $$

for any fixed natural number $L \ge 2$ and a collection of measurable bounded functions $\{\varphi_l\}_{1 \le l \le L}$
defined over the state space $\mathcal{X}$. Following the indistinguishability property, one has that the law
of $x_j^n = (x_j^n(t), t \ge 0)$ is $\mathbb{E}[m^n]$. The same result is obtained by proving the weak convergence of
the individual state dynamics to a macroscopic McKean-Vlasov equation (see later Proposition 5).
Then, when the initial states are i.i.d. and given some homogeneous control actions $u$, the solution
of the state dynamics generates an indistinguishable random process and the weak convergence
of the population profile process $m^n$ to $\mu$ is equivalent to the $\mu$-chaoticity. For general results
on mean-field convergence of controlled stochastic differential equations, we refer to [14]. These
processes depend implicitly on the strategies used by the players. Note that an admissible control
law $\gamma$ may depend on time $t$, the value of the individual state $x_j(t)$ and the mean-field process
$m_t$. The weak convergence of the process $m^n$ implies the weak convergence of its marginal $m_t^n$,
and one can characterize the distribution of $m_t$ by the Fokker-Planck-Kolmogorov (FPK)
equation:

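$$
\partial_t m_t + D_x^1\Big(m_t \int_w f(t,x,u(t),w)\,m_t(dw)\Big)
= \frac{\epsilon}{2}\,D_{xx}^2\Big(m_t \Big(\int_w \sigma'(t,x,u(t),w)\,m_t(dw)\Big) \cdot \Big(\int_w \sigma(t,x,u(t),w)\,m_t(dw)\Big)\Big). \tag{2}
$$

Here $f(\cdot) \in \mathbb{R}^k$, which we denote by $(f_{k'}(\cdot))_{1 \le k' \le k}$, where $f_{k'}$ is scalar. We let

$$ \sigma[t,x,u(t),m_t] := \int_w \sigma(t,x,u(t),w)\,m_t(dw), $$

and $\Gamma(\cdot) := \sigma[\cdot]\,\sigma'[\cdot]$ is a square matrix with dimension $k \times k$. The term $D_x^1(\cdot)$ denotes

$$ \sum_{k'=1}^{k} \frac{\partial}{\partial x_{k'}}\Big(m_t \int_w f_{k'}(t,x,u(t),w)\,m_t(dw)\Big), $$

and the last term on $D_{xx}^2(\cdot)$ is

$$ \sum_{k''=1}^{k}\sum_{k'=1}^{k} \frac{\partial^2}{\partial x_{k'}\,\partial x_{k''}}\big(m_t\,\Gamma_{k'k''}(\cdot)\big). $$

In the one-dimensional case, the terms $D_x^1$, $D_{xx}^2$ reduce to the divergence "div" and the Laplacian
operator $\Delta$, respectively.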

It is important to note that the existence of a unique rest point (distribution) in FPK does not 
automatically imply that the mean-field converges to the rest point when t goes to infinity. This 
is because the rest point may not be stable. 

Remark 1. In mathematical physics, convergence to an independent and identically distributed
system is sometimes referred to as chaoticity [?], [29], [11], and the fact that chaoticity at the
initial time implies chaoticity at further times is called propagation of chaos. However, in our
setting the chaoticity property needs to be studied together with the controls of the players. In
general, the chaoticity property may not hold. One particular case should be mentioned, which is
when the rest point $m^*$ is related to the $\delta_{m^*}$-chaoticity. If the mean-field dynamics has a unique
global attractor $m^*$, then the propagation of chaos property holds for the measure $\delta_{m^*}$. Beyond
this particular case, one may have multiple rest points, and the double limit $\lim_n \lim_t m_t^n$
may differ from the one when the order is swapped, $\lim_t \lim_n m_t^n$, leading to a non-commutative
diagram. Thus, a deep study of the underlying dynamical system is required if one wants to
analyze a performance metric for a stationary regime. A counterexample of non-commutativity
of the double limit is provided in [30]. ■

B. Cost Function 

We now introduce the cost functions for the differential game. Risk-sensitive behaviors can
be captured by cost functions which exponentiate loss functions before the expectation operator.
For each $t \in [0,T]$, and $m_t^n$, $x_j$ initialized at a generic feasible pair $m$, $x$ at $t$, the risk-sensitive
cost function for Player $j$ is given by

$$
L(\gamma_j, m^n_{[t,T]}; t, x, m) = \delta \log \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j^n(T)) + \int_t^T c(s, x_j^n(s), u_j^n(s), m_s^n)\,ds\big]} \,\Big|\, x_j(t) = x,\ m_t^n = m \Big), \tag{3}
$$

where $c(\cdot)$ is the instantaneous cost at time $s$; $g(\cdot)$ is the terminal cost; $\delta > 0$ is the
risk-sensitivity index; $m^n_{[t,T]}$ denotes the process $\{m_s^n,\ t \le s \le T\}$; and $u_j^n(s) = \gamma_j(s, x_j^n(s), m_s^n)$,
with $\gamma_j \in \Gamma_j$. Note that because of the symmetry assumption across players, the cost function of
Player $j$ is not indexed by $j$, since it is in the same structural form for all players. This is still
a game problem (and not a team problem), however, because each such cost function depends
only on the self variables (indexed by $j$ for Player $j$) as well as the common population variable
$m^n$.

We assume the following standard conditions on $c$ and $g$.

(vi) $c$ is $C^1$ in $(t,x,u,m)$; $g$ is $C^2$ in $x$; $c$, $g$ are non-negative;

(vii) $c$, $\partial_x c$, $g$, $\partial_x g$ are uniformly bounded.

The cost function (3) is called the risk-sensitive cost functional or the exponentiated integral
cost, which measures risk-sensitivity for the long run and not at each instant of time (see [18],
[35], [6], [2]). We note that the McKean-Vlasov mean field game considered here differs from
the model in [16]: specifically, in this paper, the volatility term in (SM) is a function of state,
control and the mean field, and further, the cost functional is of the risk-sensitive type.

Remark 2 (Connection with mean-variance cost). Consider the function $c_\lambda : \lambda \mapsto \frac{1}{\lambda} \log(\mathbb{E}e^{\lambda C})$.
The risk-sensitive cost $c_\lambda$ takes into consideration all the moments of the cost
$C$, and not only its mean value. Around zero, the Taylor expansion of $c_\lambda$ is given by

$$ c_\lambda \approx \mathbb{E}(C) + \frac{\lambda}{2}\,\mathrm{var}(C) + o(\lambda), $$

where the important terms are the mean cost and the variance of the cost for small $\lambda$. Hence
risk-sensitive cost entails a weighted sum of the mean and variance of the cost, to some level
of approximation.
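
The expansion in Remark 2 can be checked numerically on a small discrete cost distribution. The sketch below is only an illustration: the function names and the two-point distribution are assumptions, and the parameter `lam` plays the role of $1/\delta$ in the cost functional (3).

```python
import math


def risk_sensitive_cost(costs, probs, lam):
    """(1/lam) * log E[exp(lam * C)] for a discrete random cost C."""
    moment = sum(p * math.exp(lam * c) for c, p in zip(costs, probs))
    return math.log(moment) / lam


def mean_variance_approx(costs, probs, lam):
    """First-order expansion E[C] + (lam/2) * var(C)."""
    mean = sum(p * c for c, p in zip(costs, probs))
    var = sum(p * (c - mean) ** 2 for c, p in zip(costs, probs))
    return mean + 0.5 * lam * var


if __name__ == "__main__":
    costs, probs = [0.0, 1.0], [0.5, 0.5]
    for lam in (1.0, 0.1, 0.01):
        exact = risk_sensitive_cost(costs, probs, lam)
        approx = mean_variance_approx(costs, probs, lam)
        print(lam, exact, approx, abs(exact - approx))
```

For positive `lam` the exact value is never below the mean (by Jensen's inequality), which is the risk-averse case; the gap between the two columns shrinks as `lam` decreases.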

With the dynamics (SM) and cost functionals as introduced, we seek an individual
state-feedback non-cooperative Nash equilibrium $\{\gamma_i^*,\ i \in \{1, \ldots, n\}\}$, satisfying the set of
inequalities

$$ L(\gamma_j^*, m^n_{[0,T]}; 0, x_{j,0}, m) \le L(\gamma_j, \tilde{m}^n_{[0,T]}; 0, x_{j,0}, m), \tag{4} $$

for all $\gamma_j \in \Gamma_j$, $j \in \{1, 2, \ldots, n\}$, where $m^n_{[0,T]}$ is generated by the $\gamma_j^*$'s, and $\tilde{m}^n_{[0,T]}$ by $(\gamma_j, \gamma^*_{-j})$,
$\gamma^*_{-j} = \{\gamma_i^*,\ i = 1, 2, \ldots, n,\ i \ne j\}$; $u_j^*$ and $u_j$ are control actions generated by control laws $\gamma_j^*$
and $\gamma_j$, respectively, i.e., $u_j^* = \gamma_j^*(t, x_j)$ and $u_j = \gamma_j(t, x_j)$; the laws $m_t^n = m_t^n[u^*]$ are given by the
forward FPK equation under the strategy $\gamma^*$, and $\tilde{m}_t^n = \tilde{m}_t^n[u_j, u^*_{-j}]$ is the induced measure
under the strategy $(\gamma_j, \gamma^*_{-j})$.


A more stringent equilibrium solution concept is that of strongly time-consistent individual
state-feedback Nash equilibrium satisfying

$$ L(\gamma_j^*, m^n_{[t,T]}; t, x_j, m) \le L(\gamma_j, \tilde{m}^n_{[t,T]}; t, x_j, m), $$

for all $x_j \in \mathcal{X}$, $t \in [0,T)$, $\gamma_j \in \Gamma_j$, $j \in \{1, 2, \ldots, n\}$.

Note that the two measures $m_t^n$ and $\tilde{m}_t^n$ differ only in the component $j$ and have a common
term, $\frac{1}{n}\sum_{j' \ne j} \delta_{x_{j'}^n(t)}$, which converges in distribution to some measure with a distribution
that is a solution of the forward FPK partial differential equation.

III. Risk-sensitive best response to mean-field and equilibria 

In this section, we present the risk-sensitive mean-field results. We first provide an overview of
the mean-field (feedback) best response for a given mean-field trajectory $m^n = (m^n(s),\ s \ge 0)$.
A mean-field best-response strategy of a generic Player $j$ to a given mean field $m^n$ is a measurable
mapping $\gamma_j^*$ satisfying: $\forall\, \gamma_j \in \Gamma_j$, with $x_j$ and $m^n$ initialized at $x_{j,0}$, $m$, respectively,

$$ L(\gamma_j^*, m^n_{[0,T]}; 0, x_{j,0}, m) \le L(\gamma_j, m^n_{[0,T]}; 0, x_{j,0}, m), $$

where the law of $m^n$ is given by the forward FPK equation in the whole space $\mathcal{X}$, and is an
exogenous process. Let $v^n(t, x_j, m) = \inf_{u_j} L(u_j, m^n_{[t,T]}; t, x_j, m)$. The next proposition establishes
the risk-sensitive Hamilton-Jacobi-Bellman (HJB) equation of the risk-sensitive cost function
satisfied by a smooth optimal value function of a generic player. The main difference from the
standard HJB equation is the presence of the term $\frac{\epsilon}{2\delta}\,\|\sigma\,\partial_{x_j} v^n\|^2$.

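Proposition 1. Suppose that the trajectory of $m^n$ is given. If $v^n$ is twice continuously
differentiable, then $v^n$ is a solution of the risk-sensitive HJB equation

$$
\partial_t v^n + \inf_{u_j}\Big\{ f \cdot \partial_{x_j} v^n + \frac{\epsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{x_j x_j} v^n\big) + \frac{\epsilon}{2\delta}\,\|\sigma\,\partial_{x_j} v^n\|^2 + c \Big\} = 0,
$$

$$ v^n(T, x_j) = g(x_j). $$

Moreover, any strategy satisfying

$$
\gamma_j^*(\cdot) \in \arg\min_{u_j}\Big\{ f \cdot \partial_{x_j} v^n + \frac{\epsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{x_j x_j} v^n\big) + \frac{\epsilon}{2\delta}\,\|\sigma\,\partial_{x_j} v^n\|^2 + c \Big\}
$$

constitutes a best response strategy to the mean-field $m^n$.

Proof of Proposition 1. For feasible initial conditions $x$ and $m$, we define

$$
\phi^n(t,x,m) := \inf_{u_j} \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_T) + \int_t^T c(s, x_j^n(s), u_j(s), m_s^n)\,ds\big]} \,\Big|\, x_j(t) = x,\ m_t^n = m \Big).
$$

It is clear that $v^n(t, x_j, m) = \inf_{u_j} L = \delta \log \phi^n(t, x_j, m)$. Under the regularity assumptions of
Section II, the function $\phi^n$ is $C^1$ in $t$ and $C^2$ in $x$. Using Itô's formula,

$$
d\phi^n(t, x_j(t)) = \Big[\partial_t \phi^n + f \cdot \partial_x \phi^n + \frac{\epsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}\phi^n\big)\Big]\,dt + \sqrt{\epsilon}\,(\partial_x \phi^n)'\,\sigma\,d\mathbb{B}_j(t).
$$

Using the Itô-Dynkin formula (see [26], [6], [?]), the dynamic optimization yields

$$ \inf_{u_j}\Big\{ d\phi^n + \frac{1}{\delta}\,c\,\phi^n\,dt \Big\} = 0. $$

Thus, one obtains

$$
\partial_t \phi^n + \inf_{u_j}\Big\{ f \cdot \partial_x \phi^n + \frac{\epsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}\phi^n\big) + \frac{1}{\delta}\,c\,\phi^n \Big\} = 0.
$$

To establish the connection with the risk-sensitive cost value, we use the relation $\phi^n = e^{v^n/\delta}$.
One can compute the partial derivatives:

$$ \partial_t \phi^n = \frac{1}{\delta}\,(\partial_t v^n)\,\phi^n, \qquad \partial_x \phi^n = \frac{1}{\delta}\,(\partial_x v^n)\,\phi^n, $$

and

$$ \partial^2_{xx}\phi^n = \Big(\frac{1}{\delta}\,\partial^2_{xx} v^n + \frac{1}{\delta^2}\,\partial_x v^n\,(\partial_x v^n)'\Big)\phi^n, $$

where the latter immediately yields

$$ \mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}\phi^n\big) = \Big(\frac{1}{\delta}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx} v^n\big) + \frac{1}{\delta^2}\,\|\sigma\,\partial_x v^n\|^2\Big)\phi^n. $$

Combining together and dividing by $\phi^n/\delta$, we arrive at the HJB equation of Proposition 1. ■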



Remark 3. Let us introduce the Hamiltonian $H$ as

$$ H(t,x,p,M) = \inf_{u}\Big\{ p \cdot f + \frac{\epsilon}{2}\,\mathrm{tr}(\sigma\sigma' M) + \frac{\epsilon}{2\delta}\,\|\sigma p\|^2 + c \Big\}, $$

for a vector $p$ and a matrix $M$ which is the same as the Hessian of $v^n$.
If $\sigma$ does not depend on the control, then the above expression reduces to

$$ \inf_{u}\{ p \cdot f + c \} + \frac{\epsilon}{2}\,\mathrm{tr}(\sigma\sigma' M) + \frac{\epsilon}{2\delta}\,\|\sigma p\|^2, $$

and the term to be minimized is $\tilde{H}(t,x,p,M) = \inf_{u}\{ p \cdot f + c \}$, which is related to the
Legendre-Fenchel transform for linear dynamics, i.e., the case where $f$ is linear in the control
$u$.

In that case,

$$ \partial_p \tilde{H}(t,x,p,M) = a\,u^* $$

for some non-singular $a$ of proper dimension. This says that the derivative of the modified
Hamiltonian is related to the optimal feedback control. Now, for non-linear drift $f$ the same
technique can be used, but the function $f$ needs to be inverted to obtain a generic closed-form
expression of the optimal feedback control, which is given by

$$ u^* = g^{-1}\big(\partial_p \tilde{H}(t,x,p,M)\big), $$

where $g^{-1}$ is the inverse of the map

$$ u \mapsto f(t, x, u, m). $$

This generic expression of the optimal control will play an important role in non-linear
McKean-Vlasov mean field games.
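
As a worked instance of this remark (a sketch, assuming that the minimizer lies in the interior of the control set), take the affine-quadratic structure $f = \bar{f} + Bu$, $c = \bar{c} + \|u\|^2$, where $\bar{f}$, $\bar{c}$ and $B$ do not depend on $u$; the bars are notation introduced for this illustration. Writing $\tilde{H}(p) := \inf_u\{p \cdot f + c\}$ for the control-dependent part of the Hamiltonian, one finds

$$
\begin{aligned}
\tilde{H}(p) &= \inf_{u}\big\{ p \cdot (\bar{f} + Bu) + \bar{c} + \|u\|^2 \big\} = p \cdot \bar{f} + \bar{c} - \tfrac{1}{4}\,\|B'p\|^2, \qquad u^* = -\tfrac{1}{2}\,B'p, \\
\partial_p \tilde{H}(p) &= \bar{f} - \tfrac{1}{2}\,BB'p = \bar{f} + Bu^* = f(t, x, u^*, m).
\end{aligned}
$$

So the gradient of the reduced Hamiltonian returns the drift evaluated at the optimal control, and inverting $u \mapsto f(t,x,u,m)$ recovers $u^*$ whenever that map is injective (for instance, when $B$ has full column rank).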

The next proposition provides the best-response control for the affine-quadratic (in $u$),
exponentiated-cost, Gaussian mean-field game, and the proposition that follows deals with the case of
affine-quadratic in both $u$ and $x$.

Proposition 2. Suppose $\sigma(t,x) = \sigma(t)$ and

$$ f(t, x_j, u_j, m) = \bar{f}(t, x_j, m) + B(t, x_j, m)\,u_j, $$

$$ c(t, x_j, u_j, m) = \bar{c}(t, x_j, m) + \|u_j\|^2. $$

Then, the best-response control of Player $j$ is $\gamma_j^{n,*} = -\frac{1}{2}\,B'\,\partial_{x_j} v^n$.

Proof: Following Proposition 1, we know

$$ \gamma_j^{n,*}(\cdot) \in \arg\min_{u_j \in \mathcal{U}_j}\big\{ c(t, x_j(t), u_j(t), m_t) + f(t, x_j(t), u_j, m_t) \cdot \partial_{x_j} v^n \big\}. $$

With the assumptions on $\sigma$, $f$, $c$, $g$, the condition reduces to

$$ \arg\min_{u_j}\big\{ [\bar{f} + Bu_j] \cdot \partial_{x_j} v^n + \bar{c} + \|u_j\|^2 \big\}, $$

and hence, setting the gradient in $u_j$ to zero, $B'\,\partial_{x_j} v^n + 2u_j = 0$, we obtain
$\gamma_j^{n,*} = -\frac{1}{2}\,B'\,\partial_{x_j} v^n$ by convexity and coercivity of the mapping
$u_j \mapsto [\bar{f} + Bu_j] \cdot \partial_{x_j} v^n + \bar{c} + \|u_j\|^2$. ■

Proposition 3 (Explicit optimal control and cost, [?]).

Consider the risk-sensitive mean-field stochastic game described in Proposition 2 with $\bar{f} =
A(t)x$, $B$ a constant matrix, $\bar{c} = x'Q(t)x$, $Q(t) \ge 0$, $g(x) = x'Q_T x$, $Q_T \ge 0$, where the
symmetric matrix $Q(\cdot)$ is continuous. Then, the solution to the HJB equation in Proposition 1
(whenever it exists) is given by $v(t,x) = x'Z(t)x + \epsilon \int_t^T \mathrm{tr}\big(Z(s)\sigma\sigma'\big)\,ds$, where $Z(s)$ is the
nonnegative definite solution of the generalized Riccati differential equation

$$ \dot{Z} + A'Z + ZA + Q - Z\Big(BB' - \frac{1}{\rho^2}\,\sigma\sigma'\Big)Z = 0, \qquad Z(T) = Q_T, $$

where $\rho = \big(\frac{\delta}{2\epsilon}\big)^{1/2}$, and the optimal response strategy is

$$ u^*(t) = \gamma_j^*(\cdot) = -B'Zx. \tag{6} $$

Using Proposition 3, one has the following result for any given trajectory $(m_t^n)_{t \ge 0}$, which
enters the cost function in a particular way.

Proposition 4. If $\bar{c}$ is in the form $\bar{c} = x'\big(Q(t) - \Lambda(t, m_t^n)\big)x$, where $\Lambda$ is symmetric and continuous
in $(t, m)$, then the generalized Riccati equation becomes

$$ \dot{Z}^* + A'Z^* + Z^*A + Q - \Lambda(t, m_t^n) - Z^*\Big(BB' - \frac{1}{\rho^2}\,\sigma\sigma'\Big)Z^* = 0, \qquad Z^*(T) = Q_T, $$

and

$$ v^n(t,x) = x'Z^*x + \epsilon \int_t^T \mathrm{tr}\big(Z^*(s)\sigma\sigma'\big)\,ds. $$
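
To see how the risk-sensitivity index enters the feedback gain, the generalized Riccati equation can be integrated backward in the scalar case. The sketch below is an illustration under stated assumptions: all coefficients are constant scalars, $1/\rho^2$ is taken as $2\epsilon/\delta$ (the value obtained by substituting the quadratic ansatz $v = Zx^2 + \ldots$ into the risk-sensitive HJB equation with the term $\frac{\epsilon}{2\delta}\|\sigma\partial_x v\|^2$), and the function names and the Runge-Kutta discretization are choices made here, not part of the paper.

```python
def riccati_rhs(z, a, b, q, sigma, eps, delta):
    """dz/dt from  z' + 2*a*z + q - kappa*z**2 = 0,
    with kappa = b**2 - (2*eps/delta)*sigma**2."""
    kappa = b * b - (2.0 * eps / delta) * sigma * sigma
    return -(2.0 * a * z + q) + kappa * z * z


def solve_riccati(a, b, q, q_T, sigma, eps, delta, T, steps=1000):
    """Integrate backward from z(T) = q_T with classical RK4.
    Returns z on the uniform grid t_i = i*T/steps, i = 0..steps."""
    h = T / steps
    z = [0.0] * (steps + 1)
    z[steps] = q_T
    args = (a, b, q, sigma, eps, delta)
    for i in range(steps, 0, -1):
        zi = z[i]
        # One RK4 step of size -h (from t_i back to t_{i-1}).
        k1 = riccati_rhs(zi, *args)
        k2 = riccati_rhs(zi - 0.5 * h * k1, *args)
        k3 = riccati_rhs(zi - 0.5 * h * k2, *args)
        k4 = riccati_rhs(zi - h * k3, *args)
        z[i - 1] = zi - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z


if __name__ == "__main__":
    # Hypothetical coefficients; the feedback law is u*(t) = -b * z(t) * x.
    for delta in (1.0, 10.0, 1e6):
        z = solve_riccati(a=-0.5, b=1.0, q=1.0, q_T=1.0,
                          sigma=1.0, eps=0.1, delta=delta, T=1.0)
        print(delta, z[0])
```

As `delta` grows the correction $2\epsilon\sigma^2/\delta$ vanishes and the equation approaches the standard risk-neutral Riccati equation; for small `delta` the quadratic coefficient can change sign, in which case the backward solution may blow up in finite time, consistent with the "whenever it exists" qualifier in Proposition 3.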



A. Macroscopic McKean-Vlasov equation

Since the controls used by the players influence the mean-field limit via the state dynamics, we
need to characterize the evolution of the mean-field limit as a function of the controls. The law
of $m_t$ is the solution of the Fokker-Planck-Kolmogorov equation given by (2), and the individual
state dynamics follows the so-called macroscopic McKean-Vlasov equation

$$
d\bar{x}_j(t) = \Big(\int_w f\big(t, \bar{x}_j(t), u_j^*(t), w\big)\,m_t(dw)\Big)\,dt + \sqrt{\epsilon}\,\Big(\int_w \sigma\big(t, \bar{x}_j(t), u_j^*(t), w\big)\,m_t(dw)\Big)\,d\mathbb{B}_j(t). \tag{7}
$$

In order to obtain an error bound, we introduce the following notion: Given two measures $\mu$
and $\nu$, the Monge-Kantorovich metric (also called Wasserstein metric) between $\mu$ and $\nu$ is

$$ W_1(\mu, \nu) = \inf_{X \sim \mu,\ Y \sim \nu} \mathbb{E}|X - Y|. $$

In other words, let $E(\mu, \nu)$ be the set of probability measures $P$ on the product space such that
the image of $P$ under the projection on the first argument (resp. on the second argument) is $\mu$
(resp. $\nu$). Then,

$$ W_1(\mu, \nu) = \inf_{P \in E(\mu,\nu)} \int\!\!\int |z - z'|\,P(dz, dz'). \tag{8} $$

This is indeed a distance (it can be checked that the separation, the triangle inequality
and positivity properties are satisfied) and it metrizes the weak topology.
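
For intuition, the infimum in (8) has a closed form in one special case: two empirical measures on the real line with the same number of equally weighted atoms. There the optimal coupling pairs the sorted samples, so $W_1$ is the average absolute difference of the order statistics. A minimal sketch of that special case only (the function name is introduced here for illustration):

```python
def wasserstein1_empirical(xs, ys):
    """W_1 between (1/n) sum_i delta_{x_i} and (1/n) sum_i delta_{y_i} on the real line.
    Valid only for equal sample sizes and uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut needs equally many atoms in both samples")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    # A pure shift by 1 moves every atom by 1.
    print(wasserstein1_empirical([0.0, 1.0], [1.0, 2.0]))
```

The value depends only on the two point clouds and not on their labeling, which matches the exchangeability of the players' states.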

Proposition 5. Under the conditions (i)-(vii), the following holds: For any $t$, if the control law
$\gamma_j^*(\cdot)$ is used, then there exists $y_T > 0$ such that

$$ \mathbb{E}\big(\|x_j^n(t) - \bar{x}_j(t)\|\big) \le \frac{y_T}{\sqrt{n}}. $$

Moreover, for any $T < \infty$, there exists $C_T > 0$ such that

$$ W_1\Big(\mathcal{L}\big((x_j^n(t))_{t \in [0,T]}\big),\ \mathcal{L}\big((\bar{x}_j(t))_{t \in [0,T]}\big)\Big) \le \frac{C_T}{\sqrt{n}}, \tag{9} $$

where $\mathcal{L}(X_t)$ denotes the law of the random variable $X_t$.

The last inequality says that the error bound is at most of $O(\frac{1}{\sqrt{n}})$ for any fixed compact interval.
The proof of this assertion proceeds in the following steps: Let $x_j^n(t)$ and $\bar{x}_j(t)$ be the solutions of the
two SDEs with initial gap less than $\frac{1}{\sqrt{n}}$. Then, we take the difference between the two solutions.
In a second step, we use the triangle inequality of norms and take the expectation. The Gronwall inequality
allows one to complete the proof. A detailed proof is provided in the Appendix.

1) Risk-sensitive mean-field cost: Based on the fact that $m_t^n$ converges weakly to $m_t$ under
the admissible controls $(u_j^n(s),\ s \ge 0) \to (u_j(s),\ s \ge 0)$ when $n$ goes to infinity, one can
show the weak convergence of the risk-sensitive cost function (3) under the regularity conditions
(vi) and (vii) on functions $c$ and $g$, i.e., as $n \to \infty$,

$$
L(\gamma_j, m^n_{[t,T]}; t, x, m) \to L(u_j, m_{[t,T]}; t, x, m)
= \delta \log \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s, x_j(s), u_j(s), m_s)\,ds\big]} \,\Big|\, x_j(t) = x,\ m_t = m \Big).
$$

Based on this limiting cost, we can construct the best response to mean field in the limit.
Given $\{m_s\}_{s \in [t,T]}$, we minimize $L(u_j, m_{[t,T]}; t, x, m)$ subject to the state-dynamics constraints.

B. Fixed-point problem 

We now define the mean field equilibrium problem as the following fixed-point problem.

Definition 3. The mean field equilibrium problem (P) is one where each player solves the optimal
control problem, i.e.,

$$
\inf_{u_j}\ \delta \log \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s, x_j(s), u_j(s), m_s^*)\,ds\big]} \,\Big|\, x_j(t) = x,\ m_t = m \Big)
$$

subject to the dynamics of $x_j(t)$ given by the dynamics in Section III-A, where the mean field
$m_t$ is replaced by $m_t^*$ and $\bar{m}_t^*$ is the mean of the optimal mean field trajectory. The optimal
feedback control $u^*[t,x,m^*]$ depends on $m^*$, and $m^*$ is the mean field reproduced by all the
$u^*$, i.e., $m_t^* = m[t,u^*]$, solution of the Fokker-Planck-Kolmogorov forward equation (2). The
equilibrium is called an individual feedback mean field equilibrium if every player adopts an
individual state-feedback strategy.

Note that this problem differs from the risk-sensitive mean field stochastic optimal control
problem where the objective is

$$
\delta \log \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s, x_j(s), u_j(s), m_s[u])\,ds\big]} \Big),
$$

with $m_s[u]$ the distribution of the state dynamics $x_j(s)$ driven by the control $u_j$.

C. Risk-sensitive FPK-McV equations 

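The regular solutions to problem (P) introduced above are solutions to the HJB backward equation
combined with the FPK equation and the macroscopic McKean-Vlasov version of the limiting individual
dynamics, i.e.,

$$
d\bar{x}_j(t) = \Big(\int_w f\big(t, \bar{x}_j(t), u_j^*(t), w\big)\,m_t(dw)\Big)\,dt + \sqrt{\epsilon}\,\Big(\int_w \sigma\big(t, \bar{x}_j(t), u_j^*(t), w\big)\,m_t(dw)\Big)\,d\mathbb{B}_j(t),
\qquad \bar{x}_j(0) = x_{j,0} = x,
$$

$$
0 = \partial_t v + \inf_{u_j}\Big\{ f \cdot \partial_x v + \frac{\epsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx} v\big) + \frac{\epsilon}{2\delta}\,\|\sigma\,\partial_x v\|^2 + c \Big\},
\qquad x_j := x; \quad v(T,x) = g(x),
$$

$$
\partial_t m_t = -D_x^1\Big(m_t \int_w f(t,x,u^*,w)\,m_t(dw)\Big)
+ \frac{\epsilon}{2}\,D_{xx}^2\Big(m_t \Big(\int_w \sigma'(t,x,u^*,w)\,m_t(dw)\Big) \cdot \Big(\int_w \sigma(t,x,u^*,w)\,m_t(dw)\Big)\Big),
\qquad m_0(\cdot)\ \text{fixed}.
$$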

Then, the question of existence of a solution to the above system arises. This is a backward- 
forward system. Very little is known about the existence of a solution to such a system. In 
general, a solution may not exist as the following example demonstrates. 

D. Non-existence of solution to backward-forward boundary value problems 

There are many examples of systems of backward-forward equations which do not admit
solutions. As a very simple example from [37], consider the system:

\[
\dot v = m, \quad \dot m = -v, \quad m(0) = m_0, \quad v(T) = -m_T.
\]

It is obvious that the coefficients of this pair of backward-forward differential equations are
all uniformly Lipschitz. However, depending on $T$, this may not be solvable for $m_0 \neq 0$.
Indeed, writing $v_0 := v(0)$, the general solution is $m(t) = m_0\cos t - v_0\sin t$,
$v(t) = m_0\sin t + v_0\cos t$, so the terminal condition reads
$v_0(\cos T - \sin T) = -m_0(\sin T + \cos T)$. Hence, for $T = k\pi + \pi/4$ ($k$ a nonnegative
integer), the above two-point boundary value problem does not admit a solution for any
$m_0 \neq 0$, and it admits infinitely many solutions for $m_0 = 0$.

Following the same ideas (take expectations to recover the deterministic system above), one can
show that the system of stochastic differential equations (SDEs)

\[
dv = m\,dt + \sigma\,dB(t), \quad dm = -v\,dt + \nu\,dB(t),
\]

where $B(t)$ is the standard Brownian motion in $\mathbb{R}$, with the boundary conditions
$m(0) = m_0 \neq 0$, $v(T) = -m_T$, and $T = 5\pi/4$, has no solution.
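The solvability of the deterministic two-point problem can be checked mechanically. The sketch below assumes the system exactly as written above ($\dot v = m$, $\dot m = -v$, $m(0) = m_0$, $v(T) = -m(T)$) and reduces the terminal condition to one scalar linear equation in the unknown $v(0)$. The function names and the tolerance are illustrative assumptions, not part of the original analysis.

```python
import math


def terminal_coefficients(T):
    """Return (a, b) such that v(T) = -m(T) is equivalent to a * v0 = b * m0.

    For v' = m, m' = -v with m(0) = m0 and unknown v(0) = v0, the flow is
        m(t) = m0*cos(t) - v0*sin(t),   v(t) = m0*sin(t) + v0*cos(t).
    """
    a = math.cos(T) - math.sin(T)
    b = -(math.sin(T) + math.cos(T))
    return a, b


def classify(T, m0, tol=1e-12):
    """Classify the boundary value problem as 'unique', 'none' or 'infinitely many'."""
    a, b = terminal_coefficients(T)
    if abs(a) > tol:
        return "unique"            # v0 = b * m0 / a is the only admissible choice
    if abs(b * m0) > tol:
        return "none"              # 0 * v0 = nonzero has no solution
    return "infinitely many"       # 0 * v0 = 0 holds for every v0


if __name__ == "__main__":
    for horizon in (1.0, math.pi / 4, 5 * math.pi / 4):
        print(horizon, classify(horizon, m0=1.0), classify(horizon, m0=0.0))
```

The same reduction applies to any linear backward-forward pair: existence fails exactly where the scalar coefficient multiplying the free initial value vanishes while the right-hand side does not.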

This example shows us that the system needs to be normalized and the boundary conditions
will have to be well posed. In view of this, we will introduce the notion of reduced mean field
system in Section IV to establish the existence of equilibrium for a specific class of risk-sensitive
games.
E. Risk-sensitive mean-field equilibria

Theorem 1. Consider a risk-sensitive mean-field stochastic differential game as formulated
above. Assume that $\sigma = \sigma(t)$ and there exists a unique pair $(u^*, m^*)$ such that

(i) The coupled backward-forward PDEs

\[
\partial_t v^* + \inf_{u_j}\left\{ f^*\cdot\partial_x v^* + \frac{\epsilon}{2}\,\mathrm{tr}\left(\sigma\sigma'\partial^2_{xx}v^*\right) + \frac{\epsilon}{2\delta}\,\|\sigma\partial_x v^*\|^2 + c^*\right\} = 0,
\]
\[
v^*(T,x) = g(x), \qquad m^*_0(x)\ \text{fixed},
\]
\[
\partial_t m^*_t + D^1_x\left(m^*_t\int_w f^*(t,x,u^*,w)\,m^*_t(dw)\right) = \frac{\epsilon}{2}\,D^2_{xx}\left(m^*_t\,\sigma'(t)\sigma(t)\right)
\]

admit a pair of bounded nonnegative solutions $v^*, m^*$; and

(ii) $u^*$ minimizes the Hamiltonian, i.e., $f(t,x,u,m^*)\cdot\partial_x v^* + c(t,x,u,m^*)$.

Under these conditions, the pair $(u^*, m^*)$ is a strongly time-consistent mean-field equilibrium
and $L(t,u^*,m^*) = v^*$. In addition, if $c = x'(Q(t) - \Lambda(t,m^n))x$ where $\Lambda(t,\cdot)$ is a measurable
symmetric matrix-valued function, then any convergent subsequence of optimal control laws $\gamma^{*,n}$
leads to a best strategy for $m^*$.

Proof: See the Appendix. ■ 

Remark 4. This result can be extended to finitely many classes of players (see [25], [?], [23]
for discussions). To do so, consider a finite number of classes indexed by $\theta \in \Theta$. The individual
dynamics are indexed by $\theta$, i.e., the function $f$ becomes $f_\theta$ and $\sigma$ becomes $\sigma_\theta$. This means that the
indistinguishability property is no longer satisfied: the law depends on $\theta$ (it is not invariant
under permutation of the index). However, the invariance property holds within each class. This allows
us to establish a weak convergence of the individual dynamics of each generic player for each
class, and we obtain $x_\theta(t)$. The multi-class mean-field equilibrium is then defined by a system
for each class, and the classes are interdependent via the mean field and the value functions per
class. ■

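Limiting behavior with respect to $\epsilon$: We scale the parameters $\delta$, $\epsilon$ and $\rho$ such that $\delta = 2\epsilon\rho^2$.
The PDE given in Proposition 1 becomes

\[
\partial_t v + \inf_{u}\left\{ f^*\cdot\partial_x v + \frac{\epsilon}{2}\,\mathrm{tr}\left(\sigma\sigma'\partial^2_{xx}v\right) + \frac{1}{4\rho^2}\,\|\sigma\partial_x v\|^2 + c^*\right\} = 0, \qquad v(T,x) = g(x).
\]

When the parameter $\epsilon$ goes to zero, one arrives at a deterministic PDE. This situation captures
the large deviation limit:

\[
\partial_t v + \inf_{u}\left\{ f^*\cdot\partial_x v + \frac{1}{4\rho^2}\,\|\sigma\partial_x v\|^2 + c^*\right\} = 0, \qquad v(T,x) = g(x).
\]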

F. Equivalent stochastic mean-field problem 

In this subsection, we formulate an equivalent $(n+1)$-player game in which the state dynamics
of the $n$ players are given by the system (ESM) as follows:

\[
dx_j^n(t) = \left(\int_w f(t,x_j^n(t),u_j^n(t),w)\,m_t^n(dw) + \sigma\zeta(t)\right)dt + \sqrt{\epsilon}\,\sigma\,dB_j(t), \qquad x_j^n(0) = x_{j,0}\in\mathbb{R}^k,\ k\ge 1,\ j\in\{1,\dots,n\}, \tag{ESM}
\]

where $\zeta(t)$ is the control parameter of the "fictitious" $(n+1)$-th player. In parallel to (3), we
define the risk-neutral cost function of the $n$ players as follows:

\[
\tilde L(\bar u_j,\bar\zeta,m^n;t,x,m) = E\left( g(x_j^n(T)) + \int_t^T c(s,x_j^n(s),u_j^n(s),m_s^n)\,ds - \rho^2\int_t^T \|\zeta(s)\|^2\,ds \;\Big|\; x_j(t) = x,\ m_t^n = m \right), \tag{10}
\]

where $\bar\zeta : [0,T]\times\mathbb{R}^k \to U_{n+1}$ is the individual feedback control strategy of the fictitious Player
$n+1$ that yields an admissible control action $\zeta(t)$ in a set of feasible actions $U_{n+1}$.

Every player $j\in\{1,2,\dots,n\}$ minimizes $\tilde L$ by taking the worst case over the feedback strategy $\bar\zeta$
of player $n+1$, which is piecewise continuous in $t$ and Lipschitz in $x_j$. We refer to this game
described by (ESM) and (10) as the robust mean-field game. In the following Proposition, we
describe the connection between the mean-field risk-sensitive game problem described in (SM)
and (3) and the robust mean-field game problem described in (ESM) and (10).



Proposition 6. Under the regularity assumptions (i)-(vii), given a mean field $m^n$, the value
functions of the risk-sensitive game and the robust game problems are identical, and the mean-
field best-response control strategy of the risk-sensitive stochastic differential game is identical
to the one for the corresponding robust mean-field game.
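The link between the two games rests on a pointwise identity: for a fixed gradient $p$, the fictitious player's best action maximizes $(\sigma\zeta)\cdot p - \rho^2\|\zeta\|^2$, and the maximum is a quadratic term in $p$ with coefficient $1/(4\rho^2)$. The following is a minimal numerical sketch of that identity, written with $\sigma' p$ so that it also covers a non-symmetric $\sigma$. The helper names and the sample numbers are assumptions chosen for illustration only.

```python
def sigma_transpose_times(sigma, p):
    """Return the vector sigma' p for a matrix given as a list of rows."""
    rows, cols = len(sigma), len(sigma[0])
    return [sum(sigma[i][j] * p[i] for i in range(rows)) for j in range(cols)]


def robust_term(sigma, p, zeta, rho):
    """Evaluate (sigma zeta) . p - rho^2 * |zeta|^2 for a given action zeta."""
    w = sigma_transpose_times(sigma, p)          # (sigma zeta).p equals zeta.(sigma' p)
    linear = sum(z * wj for z, wj in zip(zeta, w))
    penalty = rho ** 2 * sum(z * z for z in zeta)
    return linear - penalty


def worst_case(sigma, p, rho):
    """Return the maximizer zeta* = sigma' p / (2 rho^2) and the value |sigma' p|^2 / (4 rho^2)."""
    w = sigma_transpose_times(sigma, p)
    zeta_star = [wj / (2.0 * rho ** 2) for wj in w]
    value = sum(wj * wj for wj in w) / (4.0 * rho ** 2)
    return zeta_star, value


if __name__ == "__main__":
    sigma = [[1.0, 0.5], [0.0, 2.0]]   # illustrative diffusion matrix
    p = [0.3, -1.2]                    # illustrative gradient of the value function
    zeta_star, value = worst_case(sigma, p, rho=2.0)
    print(zeta_star, value, robust_term(sigma, p, zeta_star, 2.0))
```

Because the objective is strictly concave in $\zeta$, the first-order condition is sufficient, which is why a closed-form maximizer is available without any search.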

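Proof: Let $\tilde v^n = \inf_{\bar u_j}\sup_{\bar\zeta}\tilde L(\bar u_j,\bar\zeta,m^n;t,x_j,m)$ denote the upper-value function
associated with this robust mean-field game. Then, under the regularity assumptions (i)-(vii),
if $\tilde v^n$ is $C^1$ in $t$ and $C^2$ in $x$, it satisfies the Hamilton-Jacobi-Isaacs (HJI) equation

\[
\inf_{u}\sup_{\zeta}\left\{ \partial_t\tilde v^n + (f + \sigma\zeta)\cdot\partial_{x_j}\tilde v^n + c - \rho^2\|\zeta\|^2 + \frac{\epsilon}{2}\,\mathrm{tr}\left(\sigma\sigma'\partial^2_{x_jx_j}\tilde v^n\right)\right\} = 0, \qquad \tilde v^n(T,x_j) = g(x_j). \tag{11}
\]

Note that (11) can be rewritten as $\inf_u\sup_\zeta\tilde H$, where

\[
\tilde H := H + (\sigma\zeta)\cdot\partial_{x_j}\tilde v^n - \rho^2\|\zeta\|^2
\]

is the Hamiltonian associated with this robust game.

Since the dependence on $u$ and $\zeta$ above is separable, the Isaacs condition (see [?]) holds,
i.e.,

\[
\inf_{u}\sup_{\zeta}\tilde H = \sup_{\zeta}\inf_{u}\tilde H.
\]

The inner maximization over $\zeta$ is attained at $\zeta^* = \sigma'\partial_{x_j}\tilde v^n/(2\rho^2)$, and (11) reduces to

\[
\partial_t\tilde v^n + \inf_{u}\left\{ f\cdot\partial_{x_j}\tilde v^n + \frac{\epsilon}{2}\,\mathrm{tr}\left(\sigma\sigma'\partial^2_{x_jx_j}\tilde v^n\right) + \frac{1}{4\rho^2}\,\|\sigma\partial_{x_j}\tilde v^n\|^2 + c\right\} = 0, \qquad \tilde v^n(T,x_j) = g(x_j). \tag{12}
\]

Note that the two PDEs, (12) and the one given in Proposition 1, are identical with $\delta = 2\epsilon\rho^2$.
Moreover, the optimal cost and the optimal control laws in the two problems are the same. ■

Remark 5. The FPK forward equation will have to be modified to include the control of the fictitious
player in the robust mean field game formulation, by including the term $\sigma\zeta$ in (ESM).
Hence the mean field equilibrium solutions to the two games are not necessarily identical.

IV. Linear state dynamics

In this section, we analyze a specific class of risk-sensitive games where state dynamics are
linear and do not depend explicitly on the mean field. We first state a related result from [24],
[11] for the risk-neutral case.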



October 11, 2012 



DRAFT 






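Theorem 2 ([24]). Consider the reduced mean field system (rMFG):

\[
\partial_t v + H(x,\nabla_x v,m_t(x)) + \frac{\sigma^2}{2}\,\partial^2_{xx}v = 0,
\]
\[
\partial_t m_t + \mathrm{div}\left(m_t\,\partial_p H(x,\nabla_x v,m_t(x))\right) - \frac{\sigma^2}{2}\,\partial^2_{xx}m_t = 0,
\]
\[
m_0(\cdot)\ \text{fixed}, \quad v(T,\cdot)\ \text{fixed}, \quad v, m\ \text{are 1-periodic}, \quad x\in(0,1)^d := \mathcal{X},
\]

where $H$ is the Legendre transform (with respect to the control) of the instantaneous cost function.

Suppose that $(x,p,z)\mapsto H(x,p,z)$ is twice continuously differentiable with respect to
$(p,z)$ and for all $(x,p,z)\in\mathcal{X}\times\mathbb{R}^d\times\mathbb{R}_+$,

\[
\begin{pmatrix}
\partial^2_{pp}H(x,p,z) & \frac{1}{2}\,\partial^2_{pz}H(x,p,z) \\
\frac{1}{2}\left[\partial^2_{pz}H(x,p,z)\right]' & -\frac{1}{z}\,\partial_z H(x,p,z)
\end{pmatrix} \succ 0.
\]

Then, there exists at most one smooth solution to the (rMFG).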



Remark 6. We have a number of observations and notes. 

• The Hamiltonian function $H$ in the result above requires a special structure. Instead of a
direct dependence on the mean field distribution $m_t$, its dependence on the mean field is
through the value of $m_t$ evaluated at state $x$.

• For global dependence on $m$, a sufficient condition for uniqueness can be found in [23]
for the case where the Hamiltonian is separable, i.e., $H(x,p,m) = \xi(x,p) + f(x,m)$ with
$f$ monotone in $m$ and $\xi$ strictly convex in $p$.

• The solution of (rMFG) can be unique even if the above conditions are violated. Further,
the uniqueness condition is independent of the horizon of the game.

• For the linear-quadratic mean field case, it has been shown in [3] that the normalized
system may have a unique i.i.d. solution or infinitely many solutions depending on the
system parameters. See also [?] for recent analysis on risk-neutral linear-quadratic mean
field games.

■ 
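The positivity condition behind the uniqueness result can be tested pointwise once the second derivatives of the Hamiltonian are known. The sketch below assumes the scalar case $d = 1$ and the block entries $a_{11} = \partial^2_{pp}H$, $a_{12} = a_{21} = \frac{1}{2}\partial^2_{pz}H$, $a_{22} = -\partial_z H/z$ used in the Appendix; the function names and the two sample Hamiltonians are hypothetical and only illustrate the test.

```python
def uniqueness_matrix(h_pp, h_pz, h_z, z):
    """Assemble the symmetric 2x2 matrix of the uniqueness condition (scalar state)."""
    if z <= 0:
        raise ValueError("the local density z must be positive")
    return [[h_pp, 0.5 * h_pz],
            [0.5 * h_pz, -h_z / z]]


def is_positive_definite_2x2(a):
    """Sylvester's criterion for a symmetric 2x2 matrix: both leading minors positive."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return a[0][0] > 0 and det > 0


if __name__ == "__main__":
    # Illustrative Hamiltonian H = p^2/2 - z^2: H_pp = 1, H_pz = 0, H_z = -2z.
    z = 1.5
    print(is_positive_definite_2x2(uniqueness_matrix(1.0, 0.0, -2.0 * z, z)))
    # Illustrative derivative values with a strong p-z coupling, which breaks the condition.
    print(is_positive_definite_2x2(uniqueness_matrix(1.0, 4.0, -1.0, 1.0)))
```

In higher dimensions the same test would be carried out with a Cholesky factorization or an eigenvalue computation instead of the two leading minors.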

The next result provides the counterpart of Theorem 2 in the risk-sensitive case. It provides
sufficient conditions for having at most one smooth solution in the risk-sensitive mean field
system by exploiting the presence of the additive quadratic term (which is strictly convex in $p$).

Fig. 1. The evolution of the distribution $m^*_t$, $0 < t < 5$, $-19 < x < 21$.

Theorem 3. Consider the risk-sensitive (reduced) mean field system (RS-rMFG). Let $\delta > 0$,
and $H(x,p,z)$ be twice continuously differentiable in $(p,z)\in\mathbb{R}^d\times\mathbb{R}_+$, satisfying the following
conditions:

• H is strictly convex in p, 

• H is decreasing in z, 

• (-¥) ■ y {dlH - ^4p/zy . id%H - ^4p/z). 

Then, (RS-rMFG) has at most one smooth solution. 

Proof: See the Appendix. ■ 

Remark 7. We observe that in contrast to Theorem 2 (risk-neutral case), the sufficiency condition
for having at most one smooth solution in (RS-rMFG) now depends on the variance term. ■

V. Numerical Illustration 

In this section, we provide two numerical examples to illustrate the risk-sensitive mean-field 
game under affine state dynamics and McKean-Vlasov dynamics. 
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A. Affine state dynamics 

We let Player $j$'s state evolution be described by a decoupled stochastic differential equation

\[
dx_j^n(t) = u_j(t)\,dt + \sqrt{\epsilon}\,\sigma\,dB_j(t).
\]

The risk-sensitive cost functional is given by



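\[
L(\bar u_j,m^n;t,x,m) = \delta\log E_{x,m}\left(\exp\left\{\frac{1}{\delta}\left[Q\,|x_j(T)|^2 + \int_t^T\left(\left(q - E(m^n_s)\right)|x_j(s)|^2 + |u_j(s)|^2\right)ds\right]\right\}\right),
\]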

where $\delta$, $Q$, $q$ are positive parameters; hence coupling of the players is only through the cost.
The optimal strategy of Player $j$ has the form

\[
u_j^*(t) = -z(t)\,x, \tag{13}
\]

where $z(t)$ is a solution to the Riccati equation

\[
\dot z(t) + q - E(m^n) - z^2(t)\left(1 - \frac{\sigma^2}{\rho^2}\right) = 0,
\]

with boundary condition $z(T) = Q$. An explicit solution is given by

\[
z(t) = \sqrt{\frac{q-M}{L}}\,\tan\left(\sqrt{L}\,\sqrt{q-M}\,(t-T) + \arctan\left(\frac{\sqrt{L}\,Q}{\sqrt{q-M}}\right)\right), \qquad 0\le t\le T,
\]

where $L := 1 - \sigma^2/\rho^2$ and $M := E(m^n)$. The FPK-McV equation reduces to

\[
\partial_t m^*_t - \partial_x\left(m^*_t\,z(t)\,x\right) = \frac{\epsilon}{2}\,\sigma^2\,\partial^2_{xx}m^*_t.
\]
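Under the linear feedback (13) the closed-loop state follows $dx = -z(t)\,x\,dt + \sqrt{\epsilon}\,\sigma\,dB$, so a Gaussian initial law stays Gaussian and the FPK-McV equation can be replaced by two moment equations, $\dot\mu = -z\mu$ and $\dot V = -2zV + \epsilon\sigma^2$. The sketch below integrates them with a classical Runge-Kutta scheme. The gain `z` is passed in as a user-supplied function, and the constant gain used in the example is an assumption for illustration, not the equilibrium gain of the paper.

```python
import math


def propagate_moments(z, mean0, var0, eps, sigma, T, steps=2000):
    """Integrate mean' = -z(t)*mean and var' = -2*z(t)*var + eps*sigma^2 on [0, T] with RK4."""

    def rhs(t, y):
        mean, var = y
        return (-z(t) * mean, -2.0 * z(t) * var + eps * sigma ** 2)

    def shifted(y, k, scale):
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    h = T / steps
    t, y = 0.0, (mean0, var0)
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + 0.5 * h, shifted(y, k1, 0.5 * h))
        k3 = rhs(t + 0.5 * h, shifted(y, k2, 0.5 * h))
        k4 = rhs(t + h, shifted(y, k3, h))
        y = tuple(yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
        t += h
    return y


if __name__ == "__main__":
    # Initial law N(1, 1); eps, sigma and T as in the numerical example; constant gain is illustrative.
    mean_T, var_T = propagate_moments(lambda t: 0.3, 1.0, 1.0, eps=5.0, sigma=2.0, T=5.0)
    print(mean_T, var_T)
```

For a constant gain the result can be compared with the closed forms $\mu(T) = \mu_0 e^{-zT}$ and $V(T) = V_0 e^{-2zT} + \frac{\epsilon\sigma^2}{2z}(1 - e^{-2zT})$, which gives a quick sanity check before plugging in a time-varying $z(t)$.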

We set the parameters as follows: $q = 1.2$, $Q = 0.1$, $\delta = 100{,}000$, $\sigma = 2.0$, $T = 5$ and $\epsilon = 5.0$.
Let $m_0(x)$ be a normal distribution $\mathcal{N}(1,1)$ and, for every $0 \le t \le T$, let $m_t(x)$ vanish at infinity.
In Figure 1 we show the evolution of the distribution, and in Figures 2 and 3 we show
the mean and the variance of the distribution, which affects the optimal strategies in (13). The
optimal linear feedback $z(t)$ is illustrated in Figure 4. We can observe that the mean value $E(m^n_t)$
monotonically decreases from 1.0 and hence the unit cost on state is monotonically increasing.
As the state cost increases, the control effort becomes relatively cheaper and therefore we can
observe an increment in the magnitude of $z(t)$. However, when the mean value goes beyond
1.08, we observe that the control effort reduces to avoid undershooting in the state.
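The feedback gain can also be obtained numerically by integrating the Riccati equation backward from its terminal condition. The sketch below takes the equation as written above, $\dot z + (q - M) - L z^2 = 0$ with $z(T) = Q$ and $L = 1 - \sigma^2/\rho^2$, and makes two simplifying assumptions that are not in the original: the mean $M$ is frozen at a constant value, and $\rho^2 = \delta/(2\epsilon)$ follows the scaling $\delta = 2\epsilon\rho^2$ introduced earlier. The function name and the step count are illustrative.

```python
def solve_riccati_backward(q, M, L, Q, T, steps=2000):
    """Integrate z' = L*z^2 - (q - M) backward from z(T) = Q with RK4.

    Returns a list zs of length steps + 1 with zs[i] approximating z(i * T / steps).
    """

    def rhs(z):
        return L * z * z - (q - M)

    h = T / steps
    z = Q
    values = [z]
    for _ in range(steps):
        # One RK4 step with step size -h (moving from t to t - h).
        k1 = rhs(z)
        k2 = rhs(z - 0.5 * h * k1)
        k3 = rhs(z - 0.5 * h * k2)
        k4 = rhs(z - h * k3)
        z = z - h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        values.append(z)
    values.reverse()
    return values


if __name__ == "__main__":
    delta, eps, sigma = 100000.0, 5.0, 2.0
    rho_sq = delta / (2.0 * eps)            # assumed scaling delta = 2*eps*rho^2
    L = 1.0 - sigma ** 2 / rho_sq
    zs = solve_riccati_backward(q=1.2, M=1.0, L=L, Q=0.1, T=5.0)
    print(zs[0], zs[-1])
```

In the coupled problem $M$ varies with time, so this routine would sit inside a fixed-point loop that alternates between the gain and the mean of the distribution; the frozen-mean version is only the inner step of such a loop.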
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B. McKean-Vlasov dynamics 

We let the dynamics of an individual player be 



\[
dx_j^n(t) = \left(\frac{1}{n}\sum_{i=1}^n x_i^n(t) + u_j^n(t)\right)dt + \sqrt{\epsilon}\,\sigma\,dB_j(t), \tag{14}
\]
and take the risk-sensitive cost function to be



L = 6 log E < exp 



Note that the cost function is independent of other players' controls or states. As $n\to\infty$,
under regularity conditions,

\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n x_i^n(t) = M(t),
\]

where $M(t)$ is the mean of the population. The feedback optimal control $u_j^*$ in response to the
mean field $M(t)$ is characterized by

\[
u_j^*(t) = -z(t)\,x_j(t) - k(t),
\]

where

\[
\dot z(t) + q - z^2(t)\left(1 - \frac{\sigma^2}{\rho^2}\right) = 0, \qquad z(T) = 0,
\]
\[
\dot k(t) - z(t)\,k(t) + z(t)\,M(t) = 0, \qquad k(T) = 0,
\]

and $\rho^2 = \frac{\delta}{2\epsilon}$, $M(t) = \int_{\mathbb{R}} x\,m(x,t)\,dx$. The Fokker-Planck-Kolmogorov (FPK) equation is

By solving the ODEs, we find that 

\[
z(t) = -\sqrt{\tilde q}\,\tan\left(\sqrt{\hat q}\,(t-T)\right), \qquad 0\le t\le T,
\]

where $\tilde q = q/(1-\sigma^2/\rho^2)$ and $\hat q = q\,(1-\sigma^2/\rho^2)$. Let $q = r = 1$ and we find the solution

k{t) = cos(t-T) (^J^ m(r)sec(T - r)tan(T - r)dr - ^ m(r')sec(T - r')tan(T - r')rfr'^ . 

Let $\sigma = 1$, $\rho = 2$, $\beta = 1$ and we show in Figure 5 the evolution of the probability density function
$m(x,t)$. The mean $M(t)$ and the variance are shown in Figure 6 and Figure 7, respectively.
Fig. 5. Evolution of the probability density function $m(x,t)$

Fig. 6. The mean $M(t)$ under equilibrium solution

Fig. 7. Variance over time under equilibrium solution
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VI. Concluding remarks 

We have studied risk-sensitive mean-field stochastic differential games with state dynamics 
given by an Ito stochastic differential equation and the cost function being the expected value 
of an exponentiated integral. 

Using a particular structure of state dynamics, we have shown that the mean-field limit of the 
individual state dynamics leads to a controlled macroscopic McKean-Vlasov equation. We have 
formulated a risk-sensitive mean-field response framework, and established its compatibility with 
the density distribution using the controlled Fokker-Planck-Kolmogorov forward equation. The 
risk-sensitive mean-field equilibria are characterized by coupled backward-forward equations. 
For the general case, the resulting mean field system is very hard to solve (numerically or
analytically) even if the number of equations has been reduced. We have, however, provided
generic explicit forms in the particular case of the affine-exponentiated-Gaussian mean-field
problem. In addition, we have shown that the risk-sensitive problem can be transformed into
a risk-neutral mean-field game problem with the introduction of an additional fictitious player. 
This allows one to study a novel class of mean field games, robust mean field games, under the 
Isaacs condition. 

An interesting direction that we leave for future research is to extend the model to accom- 
modate multiple classes of players and a drift function which may depend on the other players' 
controls. Another direction would be to soften the conditions under which Proposition 5 is 
valid, such as boundedness and Lipschitz continuity, and extend the result to games with
non-smooth coefficients. In this context, one could address a mean field central limit question on the
asymptotic behavior of the process $\sqrt{n}\,E\left(\|x_j^n(t) - x_j(t)\|\right)$. Yet another extension would be
to the time average risk-sensitive cost functional. Finally, the approach needs to be compared 
with other risk-sensitive approaches such as the mean-variance criterion and extended to the 
case where the drift is a function of the state-mean field and the control-mean field. 
Appendix

Proof of Proposition 5: Under the stated standard assumptions on the drift $f$ and variance
$\sigma$, the forward stochastic differential equation has a unique solution adapted to the filtration
generated by the Brownian motions. We want to show that

\[
E\left(\sup_{t\in[0,T]}\|x_j^n(t) - x_j(t)\|\right) \le \frac{C_T}{\sqrt{n}},
\]

where $C_T$ is a positive number which only depends on the bounds, $T$ and the Lipschitz constants
of the coefficients of the drift and the variance term. First we observe that for a fixed control
$u$, the averaging terms $\frac{1}{n}\sum_{i=1}^n f(t,x_j,u,x_i)$ and $\frac{1}{n}\sum_{i=1}^n \sigma(t,x_j,u,x_i)$ are measurable, bounded
and Lipschitz with respect to the state, uniformly with respect to time.

Second, we observe that the bound on the Lipschitz constants of the coefficients does not depend
on the population size $n$.

Hence, $\int f(t,x,u,x')\,m_t(dx')$ and $\int \sigma(t,x,u,x')\,m_t(dx')$ are bounded and Lipschitz,
uniformly with respect to $t$. Moreover, these coefficients are deterministic. This means that
there is a unique solution to the limiting SDE and that solution is measurable with respect to the filtration
generated by the mutually independent Brownian motions.

Third, we evaluate the gap between the coefficients in order to obtain an estimate of the gap between the two
processes. We start by evaluating the gap

\[
E\left\|\frac{1}{n}\sum_{i=1}^n f(t,x_j,u,x_i) - \int f(t,x_j,u,x')\,m_t(dx')\right\|.
\]

Notice that $f$ returns a $k$-dimensional vector and $x$ belongs to $\mathbb{R}^k$. By reordering the above
expression (in 2-norm), we obtain

\[
\sum_{l=1}^{k}\mathrm{var}\left(\frac{1}{n}\sum_{i=1}^n f_l(t,x_j,u,x_i)\right) \le \frac{k}{n}\left(1+\max_l b_l\right)^2 \le \frac{C}{n},
\]

where $\mathrm{var}(X)$ denotes the variance of $X$, $b_l$ is a bound on the $l$-th component of the drift
term and $C$ is a constant independent of $n$. (This bound exists because we have assumed boundedness conditions on the coefficients.)
Following a similar reasoning, we obtain the bounds on the second term in $\sigma$, i.e.,

\[
\sum_{l,l'}\mathrm{var}\left(\frac{1}{n}\sum_{i=1}^n \sigma_{ll'}(t,x_j,u,x_i)\right) \le \frac{k^2}{n}\left(1+\max_{l,l'}\alpha_{ll'}\right)^2 \le \frac{C'}{n},
\]

where $\alpha_{ll'}$ is a bound on the entry $(l,l')$ of the matrix $\sigma$ and $C'$ is a constant independent of $n$.

Now we use the Lipschitz conditions and standard Gronwall estimates to deduce that the mean
of the quadratic gap between the two stochastic processes (starting from $x$ at time 0) is of order
$O\left(\frac{1}{n}\right)$. ■

Proof of Theorem 1: Under the stated regularity and boundedness assumptions, there is
a solution to the McKean-Vlasov FPK equation. Suppose that (i) and (ii) are satisfied. Then,
$m_t = m^*(t,u^*(t))$ is the solution of the mean-field limit state dynamics, i.e., the macroscopic
McKean-Vlasov PDE when $m$ is substituted into the HJB equation. By fixing $f^*$, $c^*$, $\sigma$, we obtain
a novel HJB equation for the mean-field stochastic game. Since the new PDE admits a solution
according to (ii), the control $u^*(t) = u(t,x)$ minimizing $\partial_x v\cdot f + c$ is a best response to $m^*$ at
time $t$. The optimal response of the individual player generates a mean-field limit which in law
is a solution of the FPK PDE, and the players compute their controls as a function of this mean
field. Thus, the consistency between the control, the state and the mean field is guaranteed by
assumption (i). It follows that $(u^*,m^*)$ is a solution to the fixed-point problem, i.e., a mean-field
equilibrium, and a strongly time-consistent one.

Now, we look at the quadratic instantaneous cost case. In that case, we obtain the risk-sensitive
equations provided in Proposition 3. The fact that any convergent subsequence of best responses
to $m^n$ is a best response to $m^*$ and the fact that $u^*$ is an $\epsilon^*$-best response to the mean-field limit
$m^*$ follow from mean-field convergence of order $O\left(\frac{1}{\sqrt{n}}\right)$ and the continuity of the risk-sensitive
quadratic cost functional. ■

Proof of Theorem 3: We provide a sufficient condition for the risk-sensitive mean field game to have at most one
smooth solution. Suppose $\delta > 0$, and $\sigma$ is a positive constant. Let $H$ be the Hamiltonian associated
with the risk-neutral mean field system. Then the Hamiltonian for the risk-sensitive mean field
system is $\tilde H(x,p,m) = H + \frac{\sigma^2}{2\delta}\,\|p\|^2$. Assume that the dependence on $m$ is local, i.e., it is a
function of $m(x)$.

The generic expression for the optimal control is given by $u^* = \partial_p H(x,\partial_x v,m_t(x))$ (note that
the generic feedback control is expressed in terms of $H$, and not of $\tilde H$).

Suppose that there exist two smooth solutions $(\tilde v_1,\tilde m_1)$, $(\tilde v_2,\tilde m_2)$ to the (normalized) risk-
sensitive mean field system. Now, consider the function
$t \mapsto \int_{x\in\mathcal{X}}(\tilde v_{2,t}(x) - \tilde v_{1,t}(x))(\tilde m_{2,t}(x) - \tilde m_{1,t}(x))\,dx$.
Observe that this function is 0 at time $t = 0$ because the measures coincide initially,
and the function is equal to 0 at time $t = T$ because the final values coincide. Therefore, the
function will be identically 0 in $[0,T]$ if we show that it is monotone. This will imply that
the integrand is zero, and hence one of the two terms $(\tilde v_{2,t}(x) - \tilde v_{1,t}(x))$ or $(\tilde m_{2,t}(x) - \tilde m_{1,t}(x))$
should be 0. Then, if the measures are identical, we use the HJB equation to obtain the result.
If the value functions are identical, we can use the FPK equation to show the uniqueness of
the measure. Thus, it remains to find a sufficient condition for monotonicity, that is, a sufficient
condition under which the quantity $\int_{x\in\mathcal{X}}(\tilde v_{2,t}(x) - \tilde v_{1,t}(x))(\tilde m_{2,t}(x) - \tilde m_{1,t}(x))\,dx$ is monotone in time.
We compute the following time derivative:

\[
S(t) := \frac{d}{dt}\left[\int_{x\in\mathcal{X}}(\tilde v_{2,t}(x) - \tilde v_{1,t}(x))(\tilde m_{2,t}(x) - \tilde m_{1,t}(x))\,dx\right].
\]

We interchange the order of the integral and the differentiation and use the time derivative of a
product to arrive at

\[
S(t) = \int_{x\in\mathcal{X}}(\partial_t\tilde v_2 - \partial_t\tilde v_1)(\tilde m_2(x) - \tilde m_1(x))\,dx + \int_{x\in\mathcal{X}}(\tilde v_2 - \tilde v_1)(\partial_t\tilde m_2 - \partial_t\tilde m_1)\,dx.
\]

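Now we expand the first term $A := \int_{x\in\mathcal{X}}(\partial_t\tilde v_2 - \partial_t\tilde v_1)(\tilde m_2(x) - \tilde m_1(x))\,dx$. Consider the two
HJB equations:

\[
\partial_t\tilde v_1 + \tilde H(x,\partial_x\tilde v_1,\tilde m_1(x)) + \frac{\sigma^2}{2}\,\partial^2_{xx}\tilde v_1 = 0,
\]
\[
\partial_t\tilde v_2 + \tilde H(x,\partial_x\tilde v_2,\tilde m_2(x)) + \frac{\sigma^2}{2}\,\partial^2_{xx}\tilde v_2 = 0.
\]

To compute $A$, we take the difference between the two HJB equations above and multiply by
$\tilde m_2 - \tilde m_1$, which gives

\[
\partial_t\tilde v_2 - \partial_t\tilde v_1 = -\tilde H(x,\partial_x\tilde v_2,\tilde m_2) + \tilde H(x,\partial_x\tilde v_1,\tilde m_1) - \frac{\sigma^2}{2}\,\partial^2_{xx}\tilde v_2 + \frac{\sigma^2}{2}\,\partial^2_{xx}\tilde v_1.
\]

Next we expand the second term $B := \int_{x\in\mathcal{X}}(\partial_t\tilde m_2 - \partial_t\tilde m_1)(\tilde v_2 - \tilde v_1)\,dx$. Note that the Laplacian
terms are canceled by integration by parts in the expression $A + B$. By collecting all the terms
in $A + B$ we obtain

\[
A + B = -\int_{\mathcal{X}}\left[\tilde H(x,\partial_x\tilde v_2,\tilde m_2) - \tilde H(x,\partial_x\tilde v_1,\tilde m_1)\right](\tilde m_2 - \tilde m_1)\,dx
+ \int_{\mathcal{X}}\left[\tilde m_2\,\partial_p\tilde H(x,\partial_x\tilde v_2,\tilde m_2) - \tilde m_1\,\partial_p\tilde H(x,\partial_x\tilde v_1,\tilde m_1)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx.
\]

Letting $S(t) = A + B$, we introduce

\[
\tilde m_\lambda := (1-\lambda)\tilde m_1 + \lambda\tilde m_2 = \tilde m_1 + \lambda(\tilde m_2 - \tilde m_1).
\]

The measure $\tilde m_\lambda$ starts with $\tilde m_1$ for the parameter $\lambda = 0$ and yields the measure $\tilde m_2$ for $\lambda = 1$.
Similarly define

\[
\tilde v_\lambda := (1-\lambda)\tilde v_1 + \lambda\tilde v_2.
\]

Introduce an auxiliary integral parameterized by $\lambda$:

\[
\begin{aligned}
C_\lambda := {} & -\int_{\mathcal{X}}\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\tilde m_\lambda(x) - \tilde m_1(x))\,dx \\
& + \int_{\mathcal{X}}\tilde H(x,\partial_x\tilde v_1,\tilde m_1)(\tilde m_\lambda(x) - \tilde m_1(x))\,dx \\
& + \int_{\mathcal{X}}\tilde m_\lambda(x)\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right](\partial_x\tilde v_\lambda - \partial_x\tilde v_1)\,dx \\
& - \int_{\mathcal{X}}\tilde m_1(x)\left[\partial_p\tilde H(x,\partial_x\tilde v_1,\tilde m_1)\right](\partial_x\tilde v_\lambda - \partial_x\tilde v_1)\,dx.
\end{aligned}
\]

Substituting the terms $\tilde v_\lambda - \tilde v_1 = \lambda(\tilde v_2 - \tilde v_1)$ and $\tilde m_\lambda - \tilde m_1 = \lambda(\tilde m_2 - \tilde m_1)$, we obtain

\[
\begin{aligned}
\frac{C_\lambda}{\lambda} = {} & -\int_{\mathcal{X}}\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\tilde m_2(x) - \tilde m_1(x))\,dx \\
& + \int_{\mathcal{X}}\tilde H(x,\partial_x\tilde v_1,\tilde m_1)(\tilde m_2(x) - \tilde m_1(x))\,dx \\
& + \int_{\mathcal{X}}\tilde m_\lambda(x)\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx \\
& - \int_{\mathcal{X}}\tilde m_1(x)\left[\partial_p\tilde H(x,\partial_x\tilde v_1,\tilde m_1)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx.
\end{aligned}
\]

Using the continuity of the terms (of the RHS) above and the compactness of $\mathcal{X}$, we deduce
that

\[
\lim_{\lambda\to 0}\frac{C_\lambda}{\lambda} = 0.
\]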

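We next find a condition under which the one-dimensional function $\lambda \mapsto \frac{C_\lambda}{\lambda}$ is monotone in
$\lambda$. We need to compute the variations of $\frac{d}{d\lambda}\left(\frac{C_\lambda}{\lambda}\right)$.

Suppose that $(x,p,m) \mapsto \tilde H(x,p,m)$ is twice continuously differentiable with respect to
$(p,m)$. Then,

\[
\begin{aligned}
\frac{d}{d\lambda}\left(\frac{C_\lambda}{\lambda}\right) = {} & -\int_{\mathcal{X}}\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\partial_x\tilde v_2 - \partial_x\tilde v_1)(\tilde m_2(x) - \tilde m_1(x))\,dx \\
& - \int_{\mathcal{X}}\partial_m\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\tilde m_2(x) - \tilde m_1(x))^2\,dx \\
& + \int_{\mathcal{X}}\partial_\lambda\left(\tilde m_\lambda(x)\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right]\right)(\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx \\
= {} & -\int_{\mathcal{X}}\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\partial_x\tilde v_2 - \partial_x\tilde v_1)(\tilde m_2(x) - \tilde m_1(x))\,dx \\
& - \int_{\mathcal{X}}\partial_m\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\tilde m_2(x) - \tilde m_1(x))^2\,dx \\
& + \int_{\mathcal{X}}(\tilde m_2 - \tilde m_1)\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx \\
& + \int_{\mathcal{X}}\tilde m_\lambda\,\partial_\lambda\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx.
\end{aligned}
\]

Computation of the term $\tilde m_\lambda(x)\,\partial_\lambda\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right]$ yields

\[
D_\lambda = \partial_\lambda\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right] = \partial^2_{pp}\tilde H\cdot(\partial_x\tilde v_2 - \partial_x\tilde v_1) + \partial^2_{pm}\tilde H\cdot(\tilde m_2 - \tilde m_1),
\]

and we obtain

\[
\begin{aligned}
\frac{d}{d\lambda}\left(\frac{C_\lambda}{\lambda}\right) = {} & -\int_{\mathcal{X}}\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\partial_x\tilde v_2 - \partial_x\tilde v_1)(\tilde m_2(x) - \tilde m_1(x))\,dx \\
& - \int_{\mathcal{X}}\partial_m\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)(\tilde m_2(x) - \tilde m_1(x))^2\,dx \\
& + \int_{\mathcal{X}}(\tilde m_2 - \tilde m_1)\left[\partial_p\tilde H(x,\partial_x\tilde v_\lambda,\tilde m_\lambda)\right](\partial_x\tilde v_2 - \partial_x\tilde v_1)\,dx \\
& + \int_{\mathcal{X}}\left(\tilde m_\lambda\,\partial^2_{pp}\tilde H\cdot(\partial_x\tilde v_2 - \partial_x\tilde v_1)^2 + \tilde m_\lambda\,\partial^2_{pm}\tilde H\cdot(\tilde m_2 - \tilde m_1)(\partial_x\tilde v_2 - \partial_x\tilde v_1)\right)dx.
\end{aligned}
\]

The first and the third lines differ only by sign and therefore cancel.
Hence, we obtain

\[
\frac{d}{d\lambda}\left(\frac{C_\lambda}{\lambda}\right) = \int_{\mathcal{X}}\tilde m_\lambda\,\left(\partial_x\tilde v_2 - \partial_x\tilde v_1,\ \tilde m_2 - \tilde m_1\right)
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} \partial_x\tilde v_2 - \partial_x\tilde v_1 \\ \tilde m_2 - \tilde m_1 \end{pmatrix} dx,
\]

where

\[
a_{11} := \partial^2_{pp}\tilde H, \qquad a_{12} := \frac{1}{2}\,\partial^2_{pm}\tilde H, \qquad a_{21} := \frac{1}{2}\left(\partial^2_{pm}\tilde H\right)', \qquad a_{22} := -\frac{\partial_m\tilde H}{m}.
\]

Suppose that for all $(x,p,m)\in\mathcal{X}\times\mathbb{R}^d\times\mathbb{R}_+$, the matrix

\[
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \succ 0.
\]

Then, the monotonicity follows, and this completes the proof.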
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